PATENT 
Attorney Docket No. UM/UC-06646 

REMARKS 

Applicants note that all amendments, cancellations, and additions of Claims 
presented herein are made without acquiescing to any of the Examiner's arguments or 
rejections, and solely for the purpose of expediting the patent application process in a 
manner consistent with the PTO's Patent Business Goals (PBG),1 and without waiving the 
right to prosecute the cancelled claims (or similar claims) in the future. 

In the office action dated 4/22/04, the Examiner made a number of rejections. 
The rejections are listed below in the order in which they are herein addressed. 

(1) Claims 24-27, 33, and 39 stand rejected under 35 U.S.C. 112, first paragraph, as 
allegedly lacking enablement; and 

(2) Claims 1, 3-4, 7, 11-12, 24-27, 33, and 38-39 stand rejected under 35 U.S.C. 112, 
second paragraph, as allegedly being indefinite. 

I. The Claims are Enabled 

The Examiner has rejected Claims 24-27, 33 and 39 as allegedly lacking 
enablement (Office Action, pg. 3). In particular, the Examiner states the specification 
"does not reasonably provide enablement for a method for calculating a patients risk with 
software." Office Action, pg. 3. The Applicants respectfully disagree. The Applicants 
direct the Examiner to the specification at page 74, line 22 to page 75, line 4, which 
describes bioinformatics methods of the present invention. The Applicants further direct 
the Examiner to Examples 9 and 10, pages 118-127, and in particular to page 119, line 28 
to page 123, line 1, which describes data analysis methods for determining the association 
between Nod2 alleles and Crohn's disease. Indeed, the state of the art at the time of filing 
of the present invention (See e.g., the Teng and Risch reference cited on page 120, which 
is attached to this communication) provided for the use of 
statistical methods for calculating a subject's risk for disease based on the presence or 
absence of a particular allele. In addition, a variety of commercially available software 



1 65 Fed. Reg. 54603 (Sept. 8, 2000). 




programs are suitable for making such statistical calculations and were available at the 
filing date of the present application (e.g., including, but not limited to, Excel (available 
from Microsoft Corporation) and SAS/STAT (available from SAS Corporation)). The 
Applicants submit that one skilled in the art, given the teachings in the specification of the 
association between particular Nod2 alleles and disease, the currently available reference 
materials, and commercially available statistics software, would not have been required 
to perform undue experimentation in order to arrive at the presently claimed invention. 
Furthermore, the Examiner has provided no evidence demonstrating lack of enablement.2 
The Applicants respectfully request that the Examiner provide such evidence. As such, 
the Applicants respectfully request that the rejection be withdrawn. 

II. The Claims are not Indefinite 

In the office action dated 4/22/04, the Examiner made several rejections under 35 
U.S.C. 112, each of which is addressed in turn below. 

A) Claims 1, 3-4, 7, 11-12 and 38 are Definite 

The Examiner rejected Claims 1, 3-4, 11-12 and 38 under 35 U.S.C. 112 as 
allegedly being indefinite (Office Action, pg. 7). The Applicants respectfully disagree 
and submit that the claims are definite as written. However, in order to further the 
business interests of the Applicants and while reserving the right to prosecute the original 
(or similar) claims in the future, the Applicants have amended the claims as suggested by 
the Examiner (Office Action, pg. 7). As such, the Applicants respectfully request that the 
rejection be withdrawn. 



2 See e.g., In re Marzocchi, 439 F.2d 220, 224, 169 USPQ 367, 370 (CCPA 1971). "it is incumbent upon 
the Patent Office, whenever a rejection on this basis is made, to explain why it doubts the truth or accuracy 
of any statement in a supporting disclosure and to back up assertions of its own with acceptable evidence or 
reasoning which is inconsistent with the contested statement. Otherwise, there would be no need for the 
applicant to go to the trouble and expense of supporting his presumptively accurate disclosure." 439 F.2d at 
224, 169 USPQ at 370. 




B) Claims 3-4 are Definite 

The Examiner has rejected Claims 3-4 under 35 U.S.C. 112 as allegedly being 
indefinite (Office Action, pg. 7). In particular, the Examiner states "it is unclear what a 
genotype relative risk for said subjects encompasses." (Office Action, pg. 7). The 
Applicants respectfully disagree. 

The specification, on page 125, lines 19-21, defines "genotype relative risk" as: 
"The genotypic relative risks (GRR) are defined as the ratio of the marginal penetrance of 
the risk homozygote and heterozygote genotypes to the wild type homozygotes." This 
term is clear in the context of claim 1 in that it describes one exemplary measure of a 
subject's risk of developing Crohn's disease. 

Likewise, the specification, on page 125, lines 25-29 defines "population 
attributable risk" as "The population attributable risk was calculated as (K-Kw)/K, where 
K is the prevalence of Crohn's in the general population and Kw is the prevalence of 
Crohn's in the subpopulation consisting in individuals homozygous for the wild type 
allele at the specified variant." Again, the population attributable risk is one embodiment 
of a subject's risk of developing Crohn's disease. The Applicants submit that the 
specification clearly teaches the meaning of these claim terms. As such, the Applicants 
respectfully request that the rejection be withdrawn. 
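
By way of illustration only, the two quoted definitions can be expressed as a short calculation. This is a minimal sketch: the function names and the numerical inputs are hypothetical and are not taken from the specification.

```python
def genotype_relative_risk(penetrance_risk_genotype: float,
                           penetrance_wild_type: float) -> float:
    """Ratio of the penetrance of a risk genotype to that of the
    wild type homozygote, following the quoted GRR definition."""
    if penetrance_wild_type <= 0:
        raise ValueError("wild type penetrance must be positive")
    return penetrance_risk_genotype / penetrance_wild_type


def population_attributable_risk(k: float, k_w: float) -> float:
    """(K - Kw) / K, where K is the prevalence in the general population
    and Kw the prevalence among wild type homozygotes."""
    if k <= 0:
        raise ValueError("population prevalence must be positive")
    return (k - k_w) / k


# Hypothetical inputs chosen only to show the arithmetic.
grr_heterozygote = genotype_relative_risk(0.003, 0.001)
par = population_attributable_risk(k=0.002, k_w=0.0015)
print(grr_heterozygote, par)
```

Either quantity is a single number, which is the sense in which each is one measure of a subject's risk.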

C) Claim 7 is Definite 

The Examiner has rejected Claim 7 under 35 U.S.C. 112 as allegedly being 
indefinite (Office Action, pg. 8). The Applicants respectfully disagree and submit the 
claim is clear and definite as written. However, in order to further the business interests 
of the Applicants and while reserving the right to prosecute the original (or similar) 
claims in the future, the Applicants have canceled Claim 7. As such, the rejection is 
moot. 

D) Claims 24-27, 33 and 39 are Definite 

The Examiner has rejected Claims 24-27, 33 and 39 under 35 U.S.C. 112 as 
allegedly being indefinite (Office Action, pg. 8). In particular, the Examiner states "it is 
unclear whether the calculating the patient's risk is a number value, or a yes/no answer to 
increased risk." The Applicants respectfully disagree. As described above, the 
specification provides a description of methods for calculating a patient's risk based upon 
the presence or absence of a variant Nod2 allele. Furthermore, software for performing 
the described statistical methods was well known in the art (See above description of 
enablement of the claims). In addition, the specification provides examples of how a 
subject's risk is presented (See e.g., Example 9, pg. 118). Thus, the specification clearly 
defines the metes and bounds of the claims. The Applicants respectfully submit that the 
claims are clear as written and request that the rejection be withdrawn. 



CONCLUSION 

All grounds of rejection and objection of the Office Action of April 22, 2004 
having been addressed, reconsideration of the application is respectfully requested. It is 
respectfully submitted that the Claims should be allowed. Should the Examiner have any 
questions, or if a telephone conference would aid in the prosecution of the present 
application, the Applicants encourage the Examiner to call the undersigned collect at 
608-218-6900. 



Dated: July 20, 2004 



By: 




Tanya A. Arenson 
Registration No. 47,391 



MEDLEN & CARROLL, LLP 
101 Howard Street, Suite 350 
San Francisco, California 94105 
608/218-6900 






The Relative Power of Family-Based 
and Case-Control Designs for Linkage 
Disequilibrium Studies of Complex Human 
Diseases. II. Individual Genotyping 

Jun Teng1 and Neil Risch1-4 

1Department of Statistics, Stanford University and Departments of 2Genetics and 3Health Research and Policy, Stanford 
University School of Medicine, Stanford, California 94305 USA 

In this paper we consider test statistics based on individual genotyping. For sibships without parents, but with 
unaffected as well as affected sibs, we introduce a new test statistic (referred to as $T_{DS}$), which contrasts the allele 
frequency in affected sibs versus that estimated for the parents from the entire sibship. For sibships without 
parents, this test is analogous to the TDT and is completely robust to nonrandom mating patterns. The 
efficiency of the $T_{DS}$ test is comparable to that of the $T_{HS}$ test (which compares affected vs. unaffected sibs and 
was based on DNA pooling), for sibships with one affected child. However, as the number of affected sibs in the 
sibship grows, the relative efficiency of the $T_{DS}$ test versus the $T_{HS}$ test also increases. For example, for sibships 
with three affected, one-third fewer families are required; for families with four affected, nearly half as many 
are required. Thus, when sibships contain multiple affected individuals, the $T_{DS}$ test provides both an increase in 
power and robustness to nonrandom mating. 



In the first paper in this series, Risch and Teng (1998), 
we considered statistics based on data derived from 
DNA pooling. Only overall allele frequency estimates 
for a pool are available from such experiments; hence, 
only statistics based on pooled allele frequencies are 
possible, such as the haplotype-based haplotype rela- 
tive risk (HHRR) (Falk and Rubinstein 1987; Terwilliger 
and Ott 1992). Such statistics are not automatically 
robust to nonrandom mating, although they are con- 
servative under population stratification. Furthermore, 
such statistics may not extract all the available infor- 
mation in some study designs if individual genotyping 
is performed. Therefore, in this paper we consider 
analyses of data obtained from individual genotyping 
of all study subjects. We compare the same family con- 
stellations as described in Risch and Teng (1998). As 
individual genotyping provides more information 
than DNA pooling, it enables us to improve the statis- 
tical treatment in two ways: by increasing robustness 
and power. 

We consider statistics of the form $(\hat p_1 - \hat p_2)/\hat\sigma$, in which the numerator contrasts the estimated allele frequencies in two groups (affected sibs vs. parents) and the denominator is the estimated standard deviation of the numerator. Typically, the variance of $(\hat p_1 - \hat p_2)$ is a function of genotype frequencies in the parents. When DNA pooling has been performed, this variance has to be estimated based on the assumption of Hardy-Weinberg equilibrium. On the other hand, individual genotyping allows us to get an unbiased estimate of the variance under more general conditions and thus provides further robustness to non-random mating. More importantly, in the case where parents are unavailable, individual genotyping gives us a greater choice of the contrast we can make in the numerator, which potentially can improve the power of the test. 

Received January 7, 1998; accepted in revised form January 20, 1999. 
Corresponding author. 
EMAIL risch@lahmed.stanford.edu; FAX (650) 725-1534. 

Study designs that include affected offspring with 
parents lend themselves to the calculation of a TDT 
statistic, provided individual genotyping is performed. 
Although the TDT offers additional robustness to non- 
random mating in this case, the power of this test sta- 
tistic is generally comparable to that of the HHRR sta- 
tistic, at least when mating is nearly at random. This is 
because the Hardy-Weinberg estimator of parental het- 
erozygosity, used in the denominator of the HHRR sta- 
tistic, is close to the directly counted parental hetero- 
zygosity estimate used in the TDT (Risch and Teng 
1998, formula 4). Thus, sample size requirements using 
individual genotyping for designs involving affected 
offspring with parents, based on TDT, are essentially 
identical to those we have presented previously (Risch 
and Teng 1998) for the same designs based on DNA 
pooling and HHRR statistics (calculations performed 
but not presented). Therefore, we use the sample size 
requirements for affected sibships with parents derived 
in Risch and Teng (1998) for comparison with indi- 
vidually genotyped sibships without parents. 



Genome Research 9:234-241 ©1999 by Cold Spring Harbor Laboratory Press ISSN 1054-9803/99 $5.00; www.genome.org 



In the classic TDT, $\hat p_1$ is the allele frequency in the affected child (or children) and $\hat p_2$ the allele frequency in the parents. For sibships without parents, the test described in Risch and Teng (1998) proposes $\hat p_1$ to be the allele frequency in the affected sibs, and $\hat p_2$ the allele frequency in the unaffected sibs. When the locus-related penetrance is low, the allele frequency $\hat p_2$ in unaffected sibs can also be viewed as providing a nearly unbiased estimate of the allele frequency in the parents (in this sense, it is similar to the TDT, in which $\hat p_2$ is the observed allele frequency in the parents). When more than one child has been individually genotyped, however, it is possible to obtain a more efficient estimate of the parent allele frequency $\hat p_2$, as well as an estimate of the variance of $\hat p_1 - \hat p_2$ that is robust to nonrandom mating. We derive such a statistic below and describe its properties. 

We use the same notation as given in Risch and Teng (1998); namely, $m_{(ij)}$ denotes the conditional probability of mating type given an affected child (and similarly $m^{(r)}_{(ij)}$ for $r$ affected children), in which $i$ and $j$ are the number of A alleles in the two parents (we use parentheses in subscripts to denote unordered genotypes); $f_k$ is the ratio of penetrance in individuals with $k$ D alleles compared with dd individuals; hats over letters (circumflexes) denote sample estimates. To simplify some formulas, we also introduce the following notation: 

$$c_{21} = \frac{f_2}{f_2 + f_1}, \qquad c_{20} = 1 - c_{21} = \frac{f_1}{f_2 + f_1},$$

$$c_{01} = \frac{f_1}{f_1 + 1}, \qquad c_{00} = 1 - c_{01} = \frac{1}{f_1 + 1},$$

$$c_{12} = \frac{f_2}{f_2 + 2f_1 + 1}, \qquad c_{11} = \frac{2f_1}{f_2 + 2f_1 + 1}, \qquad c_{10} = \frac{1}{f_2 + 2f_1 + 1}.$$

We assume, as in Risch and Teng (1998), that unaf- 
fected sibs have a random genotype distribution (low 
penetrance) given the parental mating type. 
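
The weights $c_{21}, c_{20}, c_{01}, c_{00}, c_{12}, c_{11}, c_{10}$ introduced above can be computed directly from the penetrance ratios. The following is a minimal sketch; the function name `penetrance_weights` is hypothetical, and the numerator of $c_{11}$ is assumed to be $2f_1$ so that $c_{12} + c_{11} + c_{10} = 1$.

```python
def penetrance_weights(f1: float, f2: float) -> dict:
    """Genotype probabilities of an affected child given the parental
    mating type, expressed through the penetrance ratios f1 and f2."""
    d = f2 + 2 * f1 + 1  # common denominator for the Aa x Aa mating
    return {
        # AA x Aa mating: affected child is AA (c21) or Aa (c20)
        "c21": f2 / (f2 + f1),
        "c20": f1 / (f2 + f1),
        # Aa x aa mating: affected child is Aa (c01) or aa (c00)
        "c01": f1 / (f1 + 1),
        "c00": 1 / (f1 + 1),
        # Aa x Aa mating: affected child is AA (c12), Aa (c11) or aa (c10)
        "c12": f2 / d,
        "c11": 2 * f1 / d,  # assumed numerator 2*f1
        "c10": 1 / d,
    }


# Under the null hypothesis (f1 = f2 = 1) the weights reduce to
# ordinary Mendelian segregation probabilities.
print(penetrance_weights(1.0, 1.0))
```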



Affected-Unaffected Sib Pairs 

We first examine the case of one affected and one unaffected sib, without parents. For this case, there are nine possible marker genotype outcomes for the sib pair, as listed in Table 1, along with their probabilities of occurrence. To estimate the frequency of allele A in the parents ($\hat p_2$), we notice that under the null hypothesis, $f_2 = f_1 = 1$ and the affected and unaffected sibs become symmetric; so Table 1 can be simplified to six possible outcomes: (1) Both sibs are AA; (2) both sibs are aa; (3) both sibs are Aa; (4) one is AA, the other is Aa; (5) one is Aa, the other is aa; and (6) one is AA, the other is aa. There are also the same six possible genotype combinations (mating types) for the parents with respective probability $m_{(ij)}$. Because there is an equal number of parameters and independent observations, maximum likelihood estimates of the parental mating type frequencies $m_{(ij)}$ can be calculated by equating the sample frequency of each sib-pair outcome with its respective probability, namely 

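$$n_{22}/n = \hat m_{22} + \hat m_{(21)}/4 + \hat m_{11}/16$$

$$n_{00}/n = \hat m_{00} + \hat m_{(10)}/4 + \hat m_{11}/16$$

$$n_{11}/n = \hat m_{(20)} + \hat m_{(21)}/4 + \hat m_{(10)}/4 + \hat m_{11}/4$$

$$n_{21}/n + n_{12}/n = \hat m_{(21)}/2 + \hat m_{11}/4$$

$$n_{10}/n + n_{01}/n = \hat m_{(10)}/2 + \hat m_{11}/4$$

$$n_{20}/n + n_{02}/n = \hat m_{11}/8$$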

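Solving these equations, we get the unbiased maximum likelihood estimators $\hat m_{(ij)}$. These are given by 

$$\hat m_{11} = 8(n_{20} + n_{02})/n$$
$$\hat m_{(10)} = [2(n_{10} + n_{01}) - 4(n_{20} + n_{02})]/n$$
$$\hat m_{(21)} = [2(n_{21} + n_{12}) - 4(n_{20} + n_{02})]/n$$
$$\hat m_{00} = [2n_{00} - (n_{10} + n_{01}) + (n_{20} + n_{02})]/2n$$
$$\hat m_{22} = [2n_{22} - (n_{21} + n_{12}) + (n_{20} + n_{02})]/2n$$
$$\hat m_{(20)} = [2n_{11} - (n_{21} + n_{12}) - (n_{10} + n_{01})]/2n$$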
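
The six null-hypothesis equations for the sib-pair outcomes can be solved in a few lines. This is a sketch under stated assumptions: the function name and the dictionary keyed by (A alleles in affected sib, A alleles in unaffected sib) are hypothetical, and the estimators are obtained by solving the six equations directly, with $\hat m_{11} = 8(n_{20} + n_{02})/n$.

```python
def mating_type_mle(counts: dict) -> dict:
    """Estimate parental mating type frequencies from sib-pair counts.

    counts maps (a, u) to the number of sib pairs in which the affected
    sib carries a and the unaffected sib carries u copies of allele A.
    """
    n = sum(counts.values())

    def g(a, u):
        return counts.get((a, u), 0)

    x21 = g(2, 1) + g(1, 2)  # one AA sib, one Aa sib
    x10 = g(1, 0) + g(0, 1)  # one Aa sib, one aa sib
    x20 = g(2, 0) + g(0, 2)  # one AA sib, one aa sib
    return {
        "m11": 8 * x20 / n,
        "m21": (2 * x21 - 4 * x20) / n,
        "m10": (2 * x10 - 4 * x20) / n,
        "m22": (2 * g(2, 2) - x21 + x20) / (2 * n),
        "m00": (2 * g(0, 0) - x10 + x20) / (2 * n),
        "m20": (2 * g(1, 1) - x21 - x10) / (2 * n),
    }


# Hypothetical counts equal to the expected values for
# m22=0.1, m21=0.2, m20=0.1, m11=0.3, m10=0.2, m00=0.1 and n=1600.
example = {(2, 2): 270, (0, 0): 270, (1, 1): 440,
           (2, 1): 140, (1, 2): 140, (1, 0): 140, (0, 1): 140,
           (2, 0): 30, (0, 2): 30}
print(mating_type_mle(example))
```

Because these are moment-type solutions of linear equations, individual estimates can fall slightly below zero in small samples even though they sum to 1.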



Table 1. Genotype Outcomes, Scores, and Probabilities for Affected-Unaffected Sib Pair 

Affected sib   Unaffected sib   Sample frequency   Score (S)   Probability 
AA             AA               $n_{22}$           0           $m_{22} + m_{(21)} f_2/[2(f_2 + f_1)] + m_{11} f_2/[4(f_2 + 2f_1 + 1)]$ 
AA             Aa               $n_{21}$           1/4         $m_{(21)} f_2/[2(f_2 + f_1)] + m_{11} f_2/[2(f_2 + 2f_1 + 1)]$ 
AA             aa               $n_{20}$           1/2         $m_{11} f_2/[4(f_2 + 2f_1 + 1)]$ 
Aa             AA               $n_{12}$           -1/4        $m_{(21)} f_1/[2(f_2 + f_1)] + m_{11} f_1/[2(f_2 + 2f_1 + 1)]$ 
Aa             Aa               $n_{11}$           0           $m_{(21)} f_1/[2(f_2 + f_1)] + m_{(20)} + m_{11} f_1/(f_2 + 2f_1 + 1) + m_{(10)} f_1/[2(f_1 + 1)]$ 
Aa             aa               $n_{10}$           1/4         $m_{(10)} f_1/[2(f_1 + 1)] + m_{11} f_1/[2(f_2 + 2f_1 + 1)]$ 
aa             AA               $n_{02}$           -1/2        $m_{11}/[4(f_2 + 2f_1 + 1)]$ 
aa             Aa               $n_{01}$           -1/4        $m_{(10)}/[2(f_1 + 1)] + m_{11}/[2(f_2 + 2f_1 + 1)]$ 
aa             aa               $n_{00}$           0           $m_{(10)}/[2(f_1 + 1)] + m_{11}/[4(f_2 + 2f_1 + 1)] + m_{00}$ 




Then the frequency of A in the parents can be estimated by 

$$\hat p_2 = \hat m_{22} + \tfrac{3}{4}\hat m_{(21)} + \tfrac{1}{2}\hat m_{11} + \tfrac{1}{2}\hat m_{(20)} + \tfrac{1}{4}\hat m_{(10)}$$
$$= [n_{22} + \tfrac{3}{4}(n_{21} + n_{12}) + \tfrac{1}{2}(n_{11} + n_{20} + n_{02}) + \tfrac{1}{4}(n_{10} + n_{01})]/n$$

which, in this case, is the same as the A allele frequency in the combined sibling sample. Because 

$$\hat p_1 = [n_{22} + n_{21} + n_{20} + \tfrac{1}{2}(n_{12} + n_{11} + n_{10})]/n$$

we have 

$$\hat p_1 - \hat p_2 = [(n_{21} - n_{12}) + (n_{10} - n_{01}) + 2(n_{20} - n_{02})]/4n.$$

The variance of $\hat p_1 - \hat p_2$ is a function of $h$, the frequency of heterozygosity in the parents. Whereas DNA pooling required us to use the Hardy-Weinberg assumption in the estimation of $h$ (formula 5 of Risch and Teng 1998), individual genotyping allows us to obtain a more direct estimate, robust to nonrandom mating. Specifically, 

$$\hat h = \hat m_{11} + \tfrac{1}{2}\hat m_{(21)} + \tfrac{1}{2}\hat m_{(10)} = [n_{21} + n_{12} + n_{10} + n_{01} + 4(n_{20} + n_{02})]/n$$

In this case, under the null hypothesis, $\mathrm{Var}(\hat p_1 - \hat p_2) = h/16n$ (e.g., this can be calculated from the variance of $S$ in Table 1 using $f_2 = f_1 = 1$). Therefore, we can construct the statistic 

$$T_{DS} = \frac{(n_{21} - n_{12} + n_{10} - n_{01} + 2n_{20} - 2n_{02})/4n}{\sqrt{(n_{21} + n_{12} + n_{10} + n_{01} + 4n_{20} + 4n_{02})/16n^2}} \qquad (1)$$

The subscripts on T denote that we do not assume 
Hardy-Weinberg equilibrium and that sibs are used to 
construct the parent allele frequency. 
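
Statistic 1 simplifies to the signed count difference divided by the square root of the weighted discordant count, because the factors of $4n$ cancel. A minimal sketch follows; the function name and the count dictionary keyed by (A alleles in affected sib, A alleles in unaffected sib) are hypothetical, and the example counts are invented for illustration.

```python
import math


def t_ds_sib_pairs(counts: dict) -> float:
    """T_DS for one affected and one unaffected sib per family."""
    def g(a, u):
        return counts.get((a, u), 0)

    numerator = ((g(2, 1) - g(1, 2)) + (g(1, 0) - g(0, 1))
                 + 2 * (g(2, 0) - g(0, 2)))
    denominator = (g(2, 1) + g(1, 2) + g(1, 0) + g(0, 1)
                   + 4 * (g(2, 0) + g(0, 2)))
    if denominator == 0:
        raise ValueError("no genotype-discordant sib pairs in the sample")
    # (numerator / 4n) / sqrt(denominator / 16 n^2) = numerator / sqrt(denominator)
    return numerator / math.sqrt(denominator)


# Hypothetical counts; concordant pairs do not affect the statistic.
counts = {(2, 1): 30, (1, 2): 10, (1, 0): 20, (0, 1): 10,
          (2, 0): 5, (0, 2): 0, (2, 2): 25}
print(t_ds_sib_pairs(counts))
```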

To calculate the power of statistic 1, we reformat $T_{DS}$ to 

$$T_{DS} = \frac{(n_{21} - n_{12} + n_{10} - n_{01} + 2n_{20} - 2n_{02})/\sqrt{16n}}{\sqrt{(n_{21} + n_{12} + n_{10} + n_{01} + 4n_{20} + 4n_{02})/16n}}$$

We assume the denominator converges to its expected value (by the Law of Large Numbers), and thus, we need only calculate this expectation along with the mean and variance of the numerator under the alternative hypothesis. We denote the expectation of the square of the denominator as $E(\hat\sigma_0^2)$ and the mean and variance of the numerator as $\sqrt{n}\,\nu$ and $\sigma_A^2$. From Table 1, 

$$E(\hat\sigma_0^2) = \tfrac{1}{32}[m_{(21)} + m_{(10)} + m_{11}(3f_2 + 2f_1 + 3)/(f_2 + 2f_1 + 1)]$$

$$\nu = \tfrac{1}{4}(m_{(21)}\pi_{21} + m_{(10)}\pi_{10} + m_{11}\pi_{11})$$

and $\sigma_A^2 = E(\hat\sigma_0^2) - \nu^2$. 
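
The three quantities can also be obtained numerically by taking expectations of the Table 1 scores over the Table 1 probabilities, which avoids needing the $\pi$ terms explicitly. This is a sketch under assumptions: the function name and the dictionary keys (`m22`, `m21`, `m20`, `m11`, `m10`, `m00`) are hypothetical, the mating type frequencies are taken to be conditional on an affected child, and the Aa x Aa contributions to the (Aa, Aa) and (Aa, aa) cells are taken as $f_1/(f_2+2f_1+1)$ and $f_1/[2(f_2+2f_1+1)]$, which is what Mendelian segregation implies.

```python
def sib_pair_moments(m: dict, f1: float, f2: float):
    """Return (nu, E(sigma0^2), sigma_A^2) for affected-unaffected sib pairs."""
    d = f2 + 2 * f1 + 1
    # Outcome probabilities keyed by (A alleles in affected, in unaffected).
    p = {
        (2, 2): m["m22"] + m["m21"] * f2 / (2 * (f2 + f1)) + m["m11"] * f2 / (4 * d),
        (2, 1): m["m21"] * f2 / (2 * (f2 + f1)) + m["m11"] * f2 / (2 * d),
        (2, 0): m["m11"] * f2 / (4 * d),
        (1, 2): m["m21"] * f1 / (2 * (f2 + f1)) + m["m11"] * f1 / (2 * d),
        (1, 1): (m["m21"] * f1 / (2 * (f2 + f1)) + m["m20"]
                 + m["m11"] * f1 / d + m["m10"] * f1 / (2 * (f1 + 1))),
        (1, 0): m["m10"] * f1 / (2 * (f1 + 1)) + m["m11"] * f1 / (2 * d),
        (0, 2): m["m11"] / (4 * d),
        (0, 1): m["m10"] / (2 * (f1 + 1)) + m["m11"] / (2 * d),
        (0, 0): m["m10"] / (2 * (f1 + 1)) + m["m11"] / (4 * d) + m["m00"],
    }
    # The Table 1 score equals (affected alleles - unaffected alleles) / 4.
    nu = sum((a - u) / 4 * prob for (a, u), prob in p.items())
    e_sigma0_sq = sum(((a - u) / 4) ** 2 * prob for (a, u), prob in p.items())
    return nu, e_sigma0_sq, e_sigma0_sq - nu ** 2


# Hypothetical mating type frequencies and penetrance ratios.
m = {"m22": 0.1, "m21": 0.2, "m20": 0.1, "m11": 0.3, "m10": 0.2, "m00": 0.1}
print(sib_pair_moments(m, f1=2.0, f2=4.0))
```

The second returned value agrees with the closed form $\tfrac{1}{32}[m_{(21)} + m_{(10)} + m_{11}(3f_2 + 2f_1 + 3)/(f_2 + 2f_1 + 1)]$.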



Then, the power is given by 

r Affected and s Unaffected Sibs 

By using the same logic described above for one affected and one unaffected sib, we can construct a sibship-based disequilibrium test statistic for the general case of $r$ affected and $s$ unaffected sibs. We classify the various outcomes into six groups based on the possible matings that could have produced them: (I) All sibs are AA; (II) all sibs are aa; (III) all sibs are Aa; (IV) all sibs are either AA or Aa; (V) all sibs are either Aa or aa; and (VI) the genotypes AA and aa (and possibly Aa) appear among the sibs. These categories are meant to be mutually exclusive, so that, for example, group IV excludes the case of all sibs being AA. In theory, it may be possible to obtain additional information by subdividing groups IV and V by the number of Aa individuals; however, by the above grouping scheme, we are able to obtain analytic formulas for power and sample size, as described below. We can characterize each possible outcome as a vector with the six elements $(j_2, j_1, j_0, k_2, k_1, k_0)$, where $j_i$ is the number of affected sibs with $i$ A alleles, and $k_i$ is the number of unaffected sibs with $i$ A alleles. Note that $j_2 + j_1 + j_0 = r$, and $k_2 + k_1 + k_0 = s$, and we define $t = r + s$. The possible outcomes, by group, are listed in Table 2, along with their probabilities under the alternative hypothesis. Under the null hypothesis, the corresponding probabilities can be obtained by using the population mating-type frequencies instead of the conditional (on having $r$ affected children) mating-type frequencies and substituting in $f_2 = f_1 = 1$. 
To derive the $T_{DS}$ statistic, we first sum up the probabilities across all possible outcomes within each group under the null hypothesis. We obtain the following totals: 

$$\begin{aligned}
\text{I:}\quad & m_{22} + m_{(21)}(1/2)^t + m_{11}(1/4)^t \\
\text{II:}\quad & m_{11}(1/4)^t + m_{(10)}(1/2)^t + m_{00} \\
\text{III:}\quad & m_{(21)}(1/2)^t + m_{(20)} + m_{11}(1/2)^t + m_{(10)}(1/2)^t \\
\text{IV:}\quad & m_{(21)}[1 - (1/2)^{t-1}] + m_{11}[(3/4)^t - (1/2)^t - (1/4)^t] \\
\text{V:}\quad & m_{11}[(3/4)^t - (1/2)^t - (1/4)^t] + m_{(10)}[1 - (1/2)^{t-1}] \\
\text{VI:}\quad & m_{11}[1 + (1/2)^t - 2(3/4)^t]
\end{aligned} \qquad (4)$$
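
As a quick numerical check on the six group totals, they can be coded and verified to sum to the total mating type frequency. A minimal sketch, in which the function name and dictionary keys are hypothetical and the totals are taken as the null-hypothesis probabilities that a sibship of size $t$ falls in each group:

```python
def null_group_probabilities(m: dict, t: int) -> dict:
    """Null probabilities of groups I-VI for a sibship of t = r + s sibs."""
    a, b, c = 0.5 ** t, 0.75 ** t, 0.25 ** t
    return {
        "I": m["m22"] + m["m21"] * a + m["m11"] * c,
        "II": m["m11"] * c + m["m10"] * a + m["m00"],
        "III": m["m21"] * a + m["m20"] + m["m11"] * a + m["m10"] * a,
        # 1 - 2a equals 1 - (1/2)^(t-1)
        "IV": m["m21"] * (1 - 2 * a) + m["m11"] * (b - a - c),
        "V": m["m11"] * (b - a - c) + m["m10"] * (1 - 2 * a),
        "VI": m["m11"] * (1 + a - 2 * b),
    }


# Hypothetical mating type frequencies that sum to 1.
m = {"m22": 0.1, "m21": 0.2, "m20": 0.1, "m11": 0.3, "m10": 0.2, "m00": 0.1}
probs = null_group_probabilities(m, t=3)
print(probs, sum(probs.values()))
```

For $t = 2$ the group VI total reduces to $m_{11}/8$, matching the sib-pair equation for one AA and one aa sib.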

We denote by $n_{I}$ the number of observations that fall into group I and similarly for the other groups. By equating the sample frequencies of each group, that is, $n_{I}/n$, $n_{II}/n$, etc., with their respective probabilities, and solving the six equations, we can get unbiased maximum likelihood estimates of the $m_{(ij)}$'s under the null hypothesis, which are denoted by $\hat m_{(ij)}$. Recalling that $\hat p_2 = \hat m_{22} + \tfrac{3}{4}\hat m_{(21)} + \tfrac{1}{2}\hat m_{(20)} + \tfrac{1}{2}\hat m_{11} + \tfrac{1}{4}\hat m_{(10)}$, and using the maximum likelihood estimates of the $m_{(ij)}$ based on the simplified classification scheme given above, we can estimate $p_2$ by 

$$\hat p_2 = [n_{I} + \tfrac{1}{2}n_{III} + \tfrac{3}{4}n_{IV} + \tfrac{1}{4}n_{V} + \tfrac{1}{2}n_{VI}]/n \qquad (5)$$

This formula can be easily derived by taking the linear combination in equation 5 applied to the formulas in equation 4. Then, to obtain $\hat p_2$, we can simply assign a score $S(\hat p_2)$ of 1, 3/4, 1/2, 1/4, or 0 depending on the group membership of the outcome; these scores are given in Table 2. 

This derivation is similar to the approach we took 
for the simple case of one affected and one unaffected 
sib. However, in this general case, collapsing all pos- 
sible sibship outcomes (ignoring affection status) into 
the six groups defined above, although unbiased, does 
not use all of the information available. Specifically, 
within group IV there is additional information about 
parental mating type based on the frequency of sib- 
ships defined by the number of AA and Aa sibs. For 
example, in sibships of size 3, this would correspond to 
the relative frequency of sibships with two AA and one 
Aa sib versus those with one AA and two Aa sibs, which 
provides some information on the relative frequency 
of the parental mating type AA x Aa versus Aa x Aa. 
A similar comment applies to group V (for matings 
Aa x aa and Aa x Aa). For the four other sibship 
groups, further subdivision is either not possible (groups I, II, and III) or provides no additional information about mating type (group VI, in which the parental mating type is automatically Aa x Aa). By not further subgrouping groups IV and V, we are able to derive formulas for the estimate of $\hat p_2$ and $\mathrm{Var}(\hat p_1 - \hat p_2)$ that are simple and robust and can therefore also perform all power calculations and sample estimates analytically. Presumably, there is also some loss of efficiency in doing so, although much of the information 
about parental-mating type frequencies is contained in 
the relative frequency of groups I to VI. A maximum 
likelihood solution to estimate the parental mating 
type frequencies allowing for subgrouping of groups IV 
and V may be possible by numerical means; however, 
no simple formulas for parameter estimation, power 
calculations, or sample size estimates are possible in 
this case. Furthermore, we demonstrate below in nu- 
merical examples that our simple statistic is more effi- 
cient than one based on comparing the frequency of 
allele A in affected versus unaffected sibs, for sibships 
of size 3 or greater. 

Scores can also be assigned for the estimate of $\hat p_1$. To do so, we simply take $(j_2 + \tfrac{1}{2}j_1)/r$, independent of which group contains the outcome. These scores [$S(\hat p_1)$] are also given in Table 2. To estimate $\hat p_1 - \hat p_2$, we can then assign scores based on the difference in the scores $S(\hat p_1)$ and $S(\hat p_2)$; these scores, $S(\hat p_1 - \hat p_2)$, are also given in Table 2. As can be seen there, the score is $(j_2 - j_1)/4r$ in sibships with only AA and Aa sibs, $(j_1 - j_0)/4r$ in sibships with only Aa and aa sibs, and $(j_2 - j_0)/2r$ in sibships with AA and aa sibs. 
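
The grouping and scoring rules can be sketched as a small routine. The function names are hypothetical, genotypes are encoded as the number of A alleles (2 = AA, 1 = Aa, 0 = aa), and the group scores for $S(\hat p_2)$ are the values 1, 0, 1/2, 3/4, 1/4, and 1/2 for groups I to VI.

```python
S_P2 = {"I": 1.0, "II": 0.0, "III": 0.5, "IV": 0.75, "V": 0.25, "VI": 0.5}


def classify_sibship(affected, unaffected) -> str:
    """Assign a sibship to group I-VI from the genotypes of all sibs."""
    present = set(affected) | set(unaffected)
    if present == {2}:
        return "I"
    if present == {0}:
        return "II"
    if present == {1}:
        return "III"
    if present == {2, 1}:
        return "IV"
    if present == {1, 0}:
        return "V"
    return "VI"  # both AA and aa occur, with or without Aa


def sibship_score(affected, unaffected) -> float:
    """S(p1 - p2) = (j2 + j1/2)/r minus the group score for p2."""
    r = len(affected)
    s_p1 = sum(affected) / (2 * r)  # equals (j2 + j1/2) / r
    return s_p1 - S_P2[classify_sibship(affected, unaffected)]


# Two AA affected sibs and one Aa unaffected sib: group IV, (j2 - j1)/4r = 2/8.
print(classify_sibship([2, 2], [1]), sibship_score([2, 2], [1]))
```

Computing the score as a difference of the two component scores reproduces the three closed forms $(j_2 - j_1)/4r$, $(j_1 - j_0)/4r$, and $(j_2 - j_0)/2r$ without treating each group separately.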

In some sense, some of the scoring of sibships, as 
given in Table 2, may seem counterintuitive. Consider 
a sibship of two affected and one unaffected. For 
groups I to III, the uniform scoring of 0 is straightfor- 
ward, as all sibs (affected and unaffected) have the 
same genotype. Now, suppose the two affected sibs 
have genotypes AA and Aa. This sibship will be scored 
the same (0) if the unaffected sib has genotype AA or 
Aa. This is because, in either case, the sibship belongs 
to group IV, and the unaffected child does not change 
the possible mating types of the parents. On the other 
hand, if the unaffected sib is genotype aa, the sibship now belongs to group VI and gets a score of +1/4 [$(j_2 - j_0)/2r$ with $j_2 = 1$, $j_0 = 0$, and $r = 2$] because the parental mating type is Aa x Aa. As another 
example, suppose the two affected sibs have genotypes 
AA and aa. Then the sibship will be scored 0 whatever 
the genotype of the unaffected sib (i.e., AA, Aa, or aa) 
because the sibship automatically belongs to group VI. 
A scoring routine based on the frequency of the A allele 
in the affected sibs versus the unaffected sib would 
score this family differently based on whether the un- 
affected sib was AA, Aa, or aa (e.g., - 1/2 if the unaf- 
fected sib is AA, 0 if Aa, and +1/2 if aa). However, it is 
clear that in the creation of a TDT-type statistic (com- 
paring offspring with parents' allele frequency), in this 
case the unaffected child provides no additional infor- 
mation. 

Under the null hypothesis, $E(\hat p_1 - \hat p_2) = 0$. To calculate $\mathrm{Var}(\hat p_1 - \hat p_2)$, we note that $\hat p_1 - \hat p_2 = [\sum_i S_i(\hat p_1 - \hat p_2)]/n$ is the average of $n$ independent, identically distributed scores, so that $\mathrm{Var}(\hat p_1 - \hat p_2) = (1/n)\mathrm{Var}[S(\hat p_1 - \hat p_2)]$, where the subscript $i$ has been suppressed. Because $E[S(\hat p_1 - \hat p_2)] = 0$, we simply calculate $\mathrm{Var}[S(\hat p_1 - \hat p_2)] = E\{[S(\hat p_1 - \hat p_2)]^2\}$. After some lengthy algebra, we obtain 

$$\mathrm{Var}[S(\hat p_1 - \hat p_2)] = (m_{(21)} + m_{(10)})[1/r - (1/2)^{t-1}]/16 + m_{11}[1/r - \tfrac{1}{3}(3/4)^t - (1/2)^t - (1/4)^t]/8$$
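
The null variance of the score can be checked by brute-force enumeration of all sibship outcomes for small $r$ and $s$. This is a sketch under assumptions: all names are hypothetical, and the closed forms tested are $[1/r - (1/2)^{t-1}]/16$ for the AA x Aa and Aa x aa matings and $[1/r - \tfrac{1}{3}(3/4)^t - (1/2)^t - (1/4)^t]/8$ for the Aa x Aa mating, as obtained by working through the algebra.

```python
from itertools import product

# Offspring genotype distribution (number of A alleles) under the null.
OFFSPRING = {
    "21": {2: 0.5, 1: 0.5},            # AA x Aa
    "10": {1: 0.5, 0: 0.5},            # Aa x aa
    "11": {2: 0.25, 1: 0.5, 0: 0.25},  # Aa x Aa
}
# S(p2) by the set of genotypes present; anything else is group VI (1/2).
GROUP_SCORE = {frozenset({2}): 1.0, frozenset({0}): 0.0, frozenset({1}): 0.5,
               frozenset({2, 1}): 0.75, frozenset({1, 0}): 0.25}


def score(sibs, r):
    """S(p1 - p2) for a sibship whose first r members are affected."""
    s_p1 = sum(sibs[:r]) / (2 * r)
    return s_p1 - GROUP_SCORE.get(frozenset(sibs), 0.5)


def enumerated_variance(mating, r, s):
    """E[S^2] over all outcomes; equals Var[S] because E[S] = 0 under the null."""
    dist = OFFSPRING[mating]
    total = 0.0
    for sibs in product(dist, repeat=r + s):
        prob = 1.0
        for genotype in sibs:
            prob *= dist[genotype]
        total += prob * score(sibs, r) ** 2
    return total


def closed_form_variance(mating, r, s):
    t = r + s
    if mating in ("21", "10"):
        return (1 / r - 0.5 ** (t - 1)) / 16
    return (1 / r - 0.75 ** t / 3 - 0.5 ** t - 0.25 ** t) / 8


print(enumerated_variance("11", 2, 1), closed_form_variance("11", 2, 1))
```

Weighting the three per-mating values by $m_{(21)}$, $m_{(10)}$, and $m_{11}$ gives the overall variance; matings between homozygotes contribute nothing because all their sibships score 0.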

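By using logic similar to that used in the derivation of $\hat p_2$ and using the maximum likelihood estimates of the $m_{(ij)}$, we can estimate this variance by 

$$\hat V[S(\hat p_1 - \hat p_2)] = \hat\sigma_0^2 = \frac{n_{IV} + n_{V}}{16n}\cdot\frac{1/r - (1/2)^{t-1}}{1 - (1/2)^{t-1}} + \frac{n_{VI}\left\{\tfrac{1}{r}[1 - (1/2)^t - (3/4)^t + (1/4)^t] + \tfrac{8}{3}(3/8)^t - \tfrac{1}{3}(3/4)^t - (1/2)^t - (1/4)^t\right\}}{8n[1 - (1/2)^{t-1}][1 + (1/2)^t - 2(3/4)^t]} \qquad (6)$$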






Thus, the $T_{DS}$ statistic, for the general case of $r$ affected and $s$ unaffected sibs, is given by 

$$T_{DS} = \frac{\sum_i S_i(\hat p_1 - \hat p_2)}{\sqrt{n}\,\hat\sigma_0}$$

in which the scores are given in Table 2 and $\hat\sigma_0$ by the square root of formula 6. Under the null hypothesis, the $T_{DS}$ statistic is approximately normally distributed with mean 0 and variance 1. 
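
A minimal sketch of the general statistic for a sample of families that all have $r$ affected and $s$ unaffected sibs is given here. All names are hypothetical. The variance estimate is an assumption: it plugs the group counts $n_{IV}$, $n_{V}$, and $n_{VI}$ into the null variance of the score, which is how formula 6 is understood here, and it requires at least one family in groups IV to VI.

```python
import math
from collections import Counter

P2_SCORE = {"I": 1.0, "II": 0.0, "III": 0.5, "IV": 0.75, "V": 0.25, "VI": 0.5}
GROUPS = {frozenset({2}): "I", frozenset({0}): "II", frozenset({1}): "III",
          frozenset({2, 1}): "IV", frozenset({1, 0}): "V"}


def group_of(affected, unaffected) -> str:
    """Group I-VI from the A-allele counts (2, 1, 0) of all sibs."""
    return GROUPS.get(frozenset(affected) | frozenset(unaffected), "VI")


def t_ds(families, r: int, s: int) -> float:
    """families: list of (affected genotypes, unaffected genotypes) pairs."""
    n, t = len(families), r + s
    a, b, c = 0.5 ** t, 0.75 ** t, 0.25 ** t
    groups, total_score = Counter(), 0.0
    for affected, unaffected in families:
        g = group_of(affected, unaffected)
        groups[g] += 1
        total_score += sum(affected) / (2 * r) - P2_SCORE[g]
    # Plug-in estimate of the per-family null variance of the score.
    q = (1 - a - b + c) / r + (8 / 3) * 0.375 ** t - b / 3 - a - c
    var0 = ((groups["IV"] + groups["V"]) * (1 / r - 2 * a) / (16 * n * (1 - 2 * a))
            + groups["VI"] * q / (8 * n * (1 - 2 * a) * (1 + a - 2 * b)))
    return total_score / math.sqrt(n * var0)


# Hypothetical sib-pair sample (r = s = 1).
pairs = ([([2], [1])] * 30 + [([1], [2])] * 10 + [([1], [0])] * 20
         + [([0], [1])] * 10 + [([2], [0])] * 5 + [([2], [2])] * 25)
print(t_ds(pairs, r=1, s=1))
```

For $r = s = 1$ this reduces to statistic 1, which provides a convenient consistency check.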

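To calculate the power of this test, we need to determine $\nu = E[S(\hat p_1 - \hat p_2)]$, $E(\hat\sigma_0^2)$, and $\mathrm{Var}[S(\hat p_1 - \hat p_2)]$ under the alternative hypothesis. Then, using the formulas in Table 2, and after some tedious algebra, we obtain the following results: 

$$\begin{aligned}
\nu = E[S(\hat p_1 - \hat p_2)] ={}& \tfrac{1}{4} m^{(r)}_{(21)}[c_{21} - c_{20} - (1/2)^s(c_{21}^r - c_{20}^r)] \\
&+ \tfrac{1}{4} m^{(r)}_{(10)}[c_{01} - c_{00} - (1/2)^s(c_{01}^r - c_{00}^r)] \\
&+ \tfrac{1}{4} m^{(r)}_{11}[2(c_{12} - c_{10}) - (3/4)^s(c_{12} + c_{11})^r + (3/4)^s(c_{11} + c_{10})^r - (1/4)^s c_{12}^r + (1/4)^s c_{10}^r]
\end{aligned} \qquad (7)$$

$$\begin{aligned}
E(\hat\sigma_0^2) ={}& \frac{1/r - (1/2)^{t-1}}{16[1 - (1/2)^{t-1}]}\Big\{ m^{(r)}_{(21)}[1 - (1/2)^s(c_{21}^r + c_{20}^r)] + m^{(r)}_{(10)}[1 - (1/2)^s(c_{01}^r + c_{00}^r)] \\
&\quad + m^{(r)}_{11}[(3/4)^s(c_{12} + c_{11})^r + (3/4)^s(c_{11} + c_{10})^r - (1/4)^s c_{12}^r - (1/4)^s c_{10}^r - 2(1/2)^s c_{11}^r]\Big\} \\
&+ m^{(r)}_{11}\left\{\tfrac{1}{r}[1 - (1/2)^t - (3/4)^t + (1/4)^t] + \tfrac{8}{3}(3/8)^t - \tfrac{1}{3}(3/4)^t - (1/2)^t - (1/4)^t\right\} \\
&\quad \times \frac{1 - (3/4)^s(c_{11} + c_{10})^r - (3/4)^s(c_{12} + c_{11})^r + (1/2)^s c_{11}^r}{8[1 - (1/2)^{t-1}][1 + (1/2)^t - 2(3/4)^t]}
\end{aligned} \qquad (8)$$

and 

$$\begin{aligned}
\sigma_A^2 = \mathrm{Var}[S(\hat p_1 - \hat p_2)] ={}& \tfrac{1}{16r} m^{(r)}_{(21)}[r - 4(r-1)c_{21}c_{20} - r(1/2)^s(c_{21}^r + c_{20}^r)] \\
&+ \tfrac{1}{16r} m^{(r)}_{(10)}[r - 4(r-1)c_{01}c_{00} - r(1/2)^s(c_{01}^r + c_{00}^r)] \\
&+ \tfrac{1}{16r} m^{(r)}_{11}\Big\{(3/4)^s\big[r(c_{12} + c_{11})^r - 4(r-1)c_{12}c_{11}(c_{12} + c_{11})^{r-2} \\
&\qquad + r(c_{11} + c_{10})^r - 4(r-1)c_{11}c_{10}(c_{11} + c_{10})^{r-2} \\
&\qquad - 4(2f_1 + r)c_{10}^2(c_{11} + c_{10})^{r-2} - 4(rf_2 + 2f_1)c_{12}c_{10}(c_{12} + c_{11})^{r-2}\big] \\
&\qquad - r(1/2)^{s-1}c_{11}^r - r(1/4)^s(c_{12}^r + c_{10}^r) + 4(r-1)(c_{12} - c_{10})^2 + 4(c_{12} + c_{10})\Big\} - \nu^2
\end{aligned} \qquad (9)$$

The power can then be calculated using formula 3, substituting formulas 7, 8, and 9 for $\nu$, $E(\hat\sigma_0^2)$, and $\sigma_A^2$, respectively. 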

Numerical Results— Individual Genotyping 
vs. Pooling 

Using the power formulas described above, we can calculate required sample sizes to detect linkage disequilibrium. The logic is the same as described in Risch and Teng (1998) for sample pooling; again, we use a significance level of $5 \times 10^{-8}$ and 80% power. The required sample sizes are given in Table 3. Using the $T_{DS}$ test for sibships without parents with individual genotyping can produce a significant advantage over the pooled statistic ($T_{HS}$), depending on the family structure (compare with Table 4 in Risch and Teng 1998). For families with one affected sib, the sample sizes are roughly comparable, with low allele frequencies slightly favoring the $T_{DS}$ statistic but high allele frequencies slightly favoring the $T_{HS}$ statistic. As the number of affected sibs increases, however, the advantage of the $T_{DS}$ statistic increases. For two affected sibs, on average (across genetic models), 25% fewer families are required; for three affected sibs, 35% fewer are needed, whereas for four affected sibs, nearly half as many families are necessary using individual genotyping and the $T_{DS}$ statistic. As in the case for one affected child, the ratios are highest at low allele frequencies. The only exception is the high frequency dominant situation, in which the $T_{HS}$ test may retain a slight advantage. We note also that these conclusions are reasonably independent of the number of unaffected sibs used. 

From Table 2 and Table 3 of Risch and Teng 
(1998), we can also contrast the number of families 
required under individual genotyping when both par- 
ents are available versus using two unaffected sibs 
when they are not (giving an identical number of family members). Using two unaffected sibs requires ~50% 
more families, roughly independent of the number of 
affected sibs and genetic model. This number can be 
substantially higher, however, for a very common 
dominant allele. 

Combining Families of Different Structure 

As described previously in Risch and Teng (1998), it is
typical that an investigator will have families of different
structure, including different numbers of affected
sibs and possibly unaffected sibs. As in the case for
pooled samples, we suggest taking a weighted sum of
allele frequency differences $(\hat{p}_1 - \hat{p}_2)$ for the various
family structures, in which the weight is according to
the number affected in the family and the number of
families of that structure. Thus, for families with $r$ affected
sibs, we multiply $(\hat{p}_1 - \hat{p}_2)$ by $r\,n_{ri}$ before summing,
in which $n_{ri}$ is the number of families with $r$
affected of structure $i$, and then divide the total by
$N = \sum r\,n_{ri}$. To obtain the denominator, we simply sum
$r^2 n_{ri}^2 \operatorname{Var}(\hat{p}_1 - \hat{p}_2)$, in which the variance of $\hat{p}_1 - \hat{p}_2$ for
a given family structure under the null hypothesis is
given in the formulas above, divide by $N^2$, and then
take the square root.
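
The weighting scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the names `Stratum` and `combined_statistic` are hypothetical, the null variance of each structure's allele frequency difference is treated as a precomputed input (its formulas appear earlier in the paper and are not reproduced here), the weights follow one reading of the text (weight r·n for the numerator, squared weight for the variance), and all numbers in the example are invented.

```python
from math import sqrt
from typing import NamedTuple, Sequence


class Stratum(NamedTuple):
    """One family structure (hypothetical container, not from the paper)."""
    r: int            # affected sibs per family in this structure
    n: int            # number of families with this structure
    diff: float       # observed allele frequency difference p1_hat - p2_hat
    var_null: float   # Var(p1_hat - p2_hat) under the null (assumed precomputed)


def combined_statistic(strata: Sequence[Stratum]) -> float:
    """Weighted difference divided by its null standard deviation."""
    big_n = sum(s.r * s.n for s in strata)                 # N = sum of r * n
    numerator = sum(s.r * s.n * s.diff for s in strata) / big_n
    # Sum r^2 n^2 Var, divide by N^2, take the square root.
    variance = sum((s.r * s.n) ** 2 * s.var_null for s in strata) / big_n ** 2
    return numerator / sqrt(variance)


# Invented example: 200 families with one affected sib, 100 with two.
example = [
    Stratum(r=1, n=200, diff=0.01, var_null=0.00005),
    Stratum(r=2, n=100, diff=0.02, var_null=0.0001),
]
print(round(combined_statistic(example), 3))
```

With a single family structure the weights cancel, and the function reduces to the difference divided by its null standard deviation.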

DISCUSSION 

We have considered test statistics that can be created
when individual genotyping is performed in nuclear
families containing affected and unaffected sibs without
parents. We have shown previously that to calculate
the TDT for families with parents, individual genotyping
is only required for the parents, to obtain a direct
estimate of $h$. The child allele frequencies can still
be obtained by DNA pooling, which could lead to a
significant reduction in genotyping effort, especially
for larger sibships.

Table 2. Probabilities of Different Outcomes for r Affected and s Unaffected Sibs and Scores for the $T_{DS}$ Statistic

Group  Outcome          Score $p_1$                     Score $p_2$   Score $p_1 - p_2$    Probability
I      (r,0,0,s,0,0)    1                               1             0                    [illegible in scan]
II     (0,0,r,0,0,s)    0                               0             0                    [illegible in scan]
III    (0,r,0,0,s,0)    1/2                             1/2           0                    [illegible in scan]
IV     [illegible]      $(i_2 + \tfrac{1}{2} i_1)/r$    3/4           $(i_2 - i_1)/4r$     [illegible in scan]
V      [illegible]      $i_1/2r$                        1/4           $(i_1 - i_0)/4r$     [illegible in scan]
VI     [illegible]      $(i_2 + \tfrac{1}{2} i_1)/r$    1/2           $(i_2 - i_0)/2r$     [illegible in scan]
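
The scores in Table 2 lend themselves to a short function. The sketch below is an illustration under stated assumptions rather than the authors' implementation: genotypes are coded as the number of A alleles (2 = AA, 1 = Aa, 0 = aa), the six groups are assumed to be determined by which genotypes occur anywhere in the sibship, and the names `GROUPS` and `tds_scores` are hypothetical. It returns only the scores; the outcome probabilities and the variance (formula 6) are not reproduced.

```python
from fractions import Fraction

# Assumed grouping: keyed by the set of genotypes seen anywhere in the sibship.
# The second entry is the Table 2 score for the parental allele frequency p2.
GROUPS = {
    frozenset({2}): ("I", Fraction(1)),
    frozenset({0}): ("II", Fraction(0)),
    frozenset({1}): ("III", Fraction(1, 2)),
    frozenset({2, 1}): ("IV", Fraction(3, 4)),
    frozenset({1, 0}): ("V", Fraction(1, 4)),
}


def tds_scores(affected, unaffected):
    """Return (group, p1, p2, p1 - p2) for one sibship without parents."""
    if not affected:
        raise ValueError("at least one affected sib is required")
    present = frozenset(affected) | frozenset(unaffected)
    if not present <= {0, 1, 2}:
        raise ValueError("genotypes must be coded 0, 1, or 2")
    # Any sibship containing both AA and aa sibs falls into group VI.
    group, p2 = GROUPS.get(present, ("VI", Fraction(1, 2)))
    # Frequency of allele A among the affected sibs: (i2 + i1/2) / r.
    p1 = Fraction(sum(affected), 2 * len(affected))
    return group, p1, p2, p1 - p2


# Three affected sibs (AA, AA, Aa) and one unaffected sib (Aa).
print(tds_scores([2, 2, 1], [1]))
```

For this sibship the difference is 5/6 - 3/4 = 1/12, which agrees with the group IV expression $(i_2 - i_1)/4r$ for $i_2 = 2$, $i_1 = 1$, $r = 3$.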

Because it is possible to estimate the variance in 
the allele frequency difference between the affected 
and unaffected sibs without the Hardy-Weinberg as- 
sumption in families without parents, estimators that 
are immune to population stratification artifacts can 
be constructed. The statistic we have described, the $T_{DS}$
test, is analogous to the TDT because it contrasts allele 
frequencies between parents and affected offspring, as 
in the TDT, and uses a variance estimate independent 
of the Hardy-Weinberg assumption. In this case, the 
parent allele frequencies are estimated from the total 
offspring sibship, including both the affected and un- 
affected offspring. 

When the tested sibship contains only a single affected,
the power of the $T_{DS}$ statistic is quite close to
that of the pooled $T_{HS}$ statistic, so the primary advantage of
the $T_{DS}$ statistic is its robustness. However, as the number
of affected in the sibship increases, the power of the
$T_{DS}$ test increases relative to the $T_{HS}$ test, providing an
additional advantage. We also note that the $T_{DS}$ statistic
is easily calculated using the scores given in Table 2
and its variance by formula 6 above.

When families with multiple affected sibs are used,
neither the pooled statistic $T_{HS}$ described in Risch and
Teng (1998) nor the $T_{DS}$ test described here compares
favorably in terms of power with tests based on using
unrelated controls instead of unaffected sibs. Thus,
strategies involving both family-based and unrelated
controls may be preferable.

It may be tempting to use the same group of affecteds
in a two-stage process, that is, first comparing
them to unrelated controls to increase power to identify
candidate loci and then comparing these same affected
individuals to family-based controls (parents or
unaffected sibs) for robustness. However, in this approach,
the two tests will be positively correlated under
the null hypothesis, and so the threshold for significance
for the second test needs to be constructed taking
this correlation into account.

Table 3. Number of Sibships Without Parents Required to Detect LD Using Individual Genotyping

                        r = 1             r = 2         r = 3    r = 4
                    s = 1    s = 2    s = 1    s = 2    s = 2    s = 2
Dominant
  p = 0.05            642      430      196      137       72       52
  p = 0.20            455      312      250      173      147      155
  p = 0.70          5,659    4,254    6,890    4,510    6,414    9,546
Recessive
  p = 0.05         79,709   61,544   18,376   13,202    3,358      965
  p = 0.20          2,097    1,528      582      410      152       76
  p = 0.70            443      297      254      178      152      160
Multiplicative
  p = 0.05          2,538    1,715      946      642      314      180
  p = 0.20            870      604      421      286      177      125
  p = 0.70            938      649      595      404      311      269
Additive
  p = 0.05          1,490    1,002      520      356      177      110
  p = 0.20            688      475      359      244      173      142
  p = 0.70          1,393      980      959      647      505       42

(r) Number of affected sibs; (s) Number of unaffected sibs.

Other tests of linkage disequilibrium based on sib- 
ships without parents and individual genotyping have 
been proposed. Penrose first suggested the use of un- 
affected sibs as controls in association studies to pro- 
tect against artifactual results owing to population 
stratification (Clarke et al. 1956). The method of C.A.B.
Smith (Smith 1961), as also described in Clarke et al. 
(1956), is essentially based on a comparison of geno- 
types in affected children with their unaffected sibs. 
The proposal of Curtis (1997) is similar in this regard. 
Since our paper was submitted, two additional papers 
(Boehnke and Langefeld 1998; Spielman and Ewens 
1998) have appeared describing sibship-based statis- 
tics. These tests are also based on allele (or genotype) 
frequency differences between affected and unaffected
sibs, similar to the original Smith test. For sibships with 
one affected and one unaffected sib, all of these tests 
(including ours) are equivalent. However, for larger sib- 
ships the tests diverge. 

We have chosen to focus on a TDT-like statistic, 
estimating parental allele and heterozygosity fre- 
quency, as this approach yields a more efficient test for 
sibships with multiple affecteds. However, a critical as- 
sumption underlying this advantage is that unaffected 
sibs reflect a random distribution of parental alleles. 
This will certainly be nearly true whenever the "locus- 
specific" penetrance for the tested locus is low and the 
unaffected sibs are selected randomly. However, this 
statistic would not necessarily be more efficient than a 
statistic based on comparison of allele frequencies in 
affected versus unaffected sibs, when the locus-specific 
penetrance is high or when the unaffected sibs are cho- 
sen from the opposite extreme of a continuous distri- 
bution from which the affecteds are chosen (e.g., lean 
sibs of obese sib pairs) (Eaves and Meyer 1994; Risch 
and Zhang 1995). In this case, the allele frequency in 
unaffected sibs is also expected to deviate from the 
parental allele frequency. The relative efficiency of the 
two types of tests, in this case, will depend on the de- 
gree to which the allele frequency in affected sibs is 
expected to deviate from that in unaffected sibs rela- 
tive to that in the parents, and on the number of un- 
affected sibs. 

At first glance, it may seem mysterious why
the $T_{DS}$ statistic has increased efficiency over other
sibship-based statistics that compare affected and unaffected
sibs. These latter statistics are based solely on
comparisons of genotypes within sibships. However, 
there is additional information available in the sample 
that our statistic incorporates, namely, the relative fre- 
quency of the different sibship genotype constellations 
(ignoring affection status in the sibship). For example, 
for sibships of size 3, we also use the frequency of sibships
with three AA sibs, two AA and one Aa sib, two
AA and one aa sib, and so on (for all possible genotype 
combinations). This distribution of sibship genotypes 
provides information regarding the frequency of the 
six possible parent mating types. Because the mating- 
type frequencies are estimated without assuming ran- 
dom mating, the estimation procedure is robust to any 
deviation from random mating including population 
stratification. For example, in the extreme stratifica- 
tion case in which half the sibships have three AA sibs 
and the other half three aa sibs, our procedure estimates
half the parent mating types to be AA × AA and
the other half to be aa × aa, a complete deviation
from random mating and Hardy-Weinberg genotype 
frequencies. 
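
The extreme stratification case can be checked with a few lines of arithmetic. The sketch below does not reproduce the mating-type estimation procedure itself; it only contrasts the heterozygote frequency that Hardy-Weinberg proportions would predict with what is actually observed in such a sample. The function name `heterozygosity_contrast`, the genotype coding (number of A alleles), and the sample of 100 sibships are assumptions made for illustration.

```python
def heterozygosity_contrast(sibships):
    """Return (allele frequency, observed het frequency, Hardy-Weinberg expectation)."""
    genotypes = [g for sibship in sibships for g in sibship]
    p = sum(genotypes) / (2 * len(genotypes))             # frequency of allele A
    observed = sum(g == 1 for g in genotypes) / len(genotypes)
    expected = 2 * p * (1 - p)                            # 2pq under Hardy-Weinberg
    return p, observed, expected


# Half the sibships have three AA sibs, the other half three aa sibs.
stratified = [[2, 2, 2]] * 50 + [[0, 0, 0]] * 50
print(heterozygosity_contrast(stratified))  # (0.5, 0.0, 0.5)
```

An allele frequency of 0.5 would imply 50% heterozygotes under Hardy-Weinberg proportions, yet none are observed, which is why a variance estimate that assumes those proportions would be misleading here.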

The analogy of the $T_{DS}$ statistic to the TDT statistic
may also seem mysterious if the latter is viewed as a
statistic derivable only from intact nuclear families. As
we showed in Risch and Teng (1998), however, the TDT
is calculated from three components: (1) the frequency
of allele A in the offspring ($p_1$); (2) the frequency of
allele A in the parents ($p_2$); and (3) the frequency of
heterozygous parents ($h$). It is entirely unnecessary to
have intact families to derive these statistics. For example,
$p_1$ and $p_2$ can be obtained, in theory, by DNA
pooling, whereby all children are pooled together and 
all parents are pooled together. Even if parent DNA 
samples are separated from their offspring's, a TDT can 
still be calculated. All that is required is knowing that a 
sample is from a child or a parent. Thus, it is obviously 
unnecessary to know which child genotypes are asso- 
ciated with which parent genotypes to construct a TDT. 
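
This point can be illustrated numerically for the simplest design, N families each with two parents and one affected child. If b and c count the A alleles transmitted and not transmitted by heterozygous parents, then $p_1 - p_2 = (b - c)/4N$ and $h = (b + c)/2N$, so the usual TDT z-score $(b - c)/\sqrt{b + c}$ equals $(p_1 - p_2)\sqrt{8N/h}$. This identity is a worked derivation for that design only, not a formula quoted from the paper, and the function names and genotype data below are hypothetical (genotypes are coded as the number of A alleles).

```python
from math import sqrt


def tdt_from_trios(trios):
    """TDT z-score from intact trios given as (father, mother, child)."""
    b = c = 0  # A transmitted / not transmitted by heterozygous parents
    for father, mother, child in trios:
        het = (father == 1) + (mother == 1)
        hom_aa = (father == 2) + (mother == 2)
        transmitted = child - hom_aa   # child's A alleles that came from het parents
        b += transmitted
        c += het - transmitted
    return (b - c) / sqrt(b + c)


def tdt_from_pools(children, parents):
    """Same z-score from unlinked child and parent genotype lists."""
    n = len(children)                                  # one child per family
    p1 = sum(children) / (2 * len(children))           # allele A in offspring
    p2 = sum(parents) / (2 * len(parents))             # allele A in parents
    h = sum(g == 1 for g in parents) / len(parents)    # heterozygous parents
    return (p1 - p2) * sqrt(8 * n / h)


trios = [(1, 1, 2), (1, 0, 1), (2, 1, 2), (1, 1, 1), (1, 2, 2), (0, 1, 0), (1, 1, 2)]
children = sorted(child for _, _, child in trios)      # family links discarded
parents = sorted(g for f, m, _ in trios for g in (f, m))
print(round(tdt_from_trios(trios), 4), round(tdt_from_pools(children, parents), 4))
```

Sorting the two lists destroys every link between a child and its parents, yet both functions return the same value, because only the three summary quantities enter the statistic.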

In the $T_{DS}$ statistic, we are effectively recreating a
TDT-type statistic. In this case, however, parental allele 
frequencies and heterozygosity are not estimated di- 
rectly from the parents, who are missing, but from the 
offspring. That this can be done without bias derives 
from the fact that there are at least as many different 
possible sibship genotype constellations as parent mat- 
ing types. 

ACKNOWLEDGMENTS 

This work was supported, in part, by grants from the National 
Human Genome Research Institute (HG00348) and the 
Nancy Pritzker Foundation. We are grateful to Dr. Michael 
Boehnke for many helpful comments and suggestions on this 
manuscript and to Drs. David Curtis and Cedric Clarke for 
pointing out the Clarke et al. reference. 

The publication costs of this article were defrayed in part 
by payment of page charges. This article must therefore be 
hereby marked "advertisement" in accordance with 18 USC 
section 1734 solely to indicate this fact. 

REFERENCES 

Boehnke, M. and C.D. Langefeld. 1998. Genetic association mapping
based on discordant sib pairs: The discordant-alleles test. Am. J.
Hum. Genet. 62: 950-961.

Clarke, C.A., J. Wyn Edwards, D.R.W. Haddock, A.W. Howel-Evans,
R.B. McConnell, and P.M. Sheppard. 1956. ABO blood groups
and secretor character in duodenal ulcer. Br. Med. J. 2: 725-731.

Curtis, D. 1997. Use of siblings as controls in case-control
association studies. Ann. Hum. Genet. 61: 319-333.

Eaves, L. and J. Meyer. 1994. Locating human quantitative trait loci:
Guidelines for the selection of sibling pairs for genotyping.
Behav. Genet. 24: 443-455.

Falk, C.T. and P. Rubinstein. 1987. Haplotype relative risks: An easy
reliable way to construct a proper control sample for risk
calculations. Ann. Hum. Genet. 51: 227-233.

Risch, N. and H. Zhang. 1995. Extreme discordant sib pairs for
mapping quantitative trait loci in humans. Science
268: 1584-1589.

Risch, N. and J. Teng. 1998. The relative power of family-based and
case-control designs for association studies of complex human
diseases. I. DNA pooling. Genome Res. 8: 1273-1288.

Smith, C.A.B. 1961. Statistical methods and theory. In Recent
advances in human genetics (ed. L.S. Penrose), pp. 148-149. J.&A.
Churchill, Ltd., London, UK.

Spielman, R.S., R.E. McGinnis, and W.J. Ewens. 1993. Transmission
test for linkage disequilibrium: The insulin gene region and
insulin-dependent diabetes mellitus (IDDM). Am. J. Hum. Genet.
52: 506-516.

Spielman, R.S. and W.J. Ewens. 1998. A sibship based test for linkage
in the presence of association: The sib transmission/disequilibrium
test. Am. J. Hum. Genet. 62: 450-458.

Terwilliger, J.D. and J. Ott. 1992. A haplotype-based "haplotype-relative
risk" approach to detecting allelic associations. Hum. Hered.
42: 337-346.



Received November 9, 1998; accepted in revised form January 20, 1999. 


